The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases
Authors
Kian Win Ong, Yannis Papakonstantinou, Romain Vernoux
Abstract
SQL-on-Hadoop, NewSQL and NoSQL databases provide semi-structured data models (typically JSON-based) and corresponding query languages. The lack of formal syntax and semantics, the idiomatic (non-SQL) language constructs, and the large variations in syntax, semantics and actual capabilities pose problems even to database experts: it is hard to understand, compare and use these languages. It is especially tedious to write software that interoperates between two of them, or between an SQL database and one of them. Towards solving these problems, we first formally specify the syntax and semantics of SQL++. It consists of a semi-structured data model (which extends both JSON and the relational data model) and a query language that is fully backwards compatible with SQL. SQL++ is "unifying" in the sense that it is explicitly designed to encompass the data model and query language capabilities of current SQL-on-Hadoop, NoSQL and NewSQL databases. Then, we itemize fifteen SQL++ data model and query language features and benchmark eleven databases on their support of the multiple options associated with each feature, leading to feature matrices and commentary. Each feature matrix is the result of empirical validation through sample queries. Since SQL itself is a subset of SQL++, the SQL-aware reader will easily identify in which ways each of the surveyed databases provides more or less than SQL. The eleven databases are Hive, Jaql, Pig, Cassandra, JSONiq, MongoDB, Couchbase, SQL, AsterixDB, BigQuery and UnityJDBC. They were selected because of their market adoption or because they offer cutting-edge, advanced query language abilities.
Finally, we briefly discuss two uses of SQL++ in FORWARD. The first is as the query language of the FORWARD virtual database query processor, which executes SQL++ queries over SQL and non-SQL databases. The second is in the FORWARD application framework, which enables rapid development of live reports and interactive applications on SQL and non-SQL databases. FORWARD provides a proof of concept of SQL++'s applicability as a unifying data model and query language.
This work was supported by NSF DC 0910820, NSF III 1219263, NSF IIS 1237174 and Informatica grants. The grants' PI is Prof. Papakonstantinou, who is a shareholder of an entity that commercializes some results mentioned in this research.
Related resources
Polystore Query Rewriting: The Challenges of Variety
Numerous databases marketed as SQL-on-Hadoop, NewSQL [16] and NoSQL have emerged to catalyze Big Data applications. These databases generally support the 3Vs [7]: (i) Volume: amount of data; (ii) Velocity: speed of data in and out; (iii) Variety: semi-structured and heterogeneous data. As a result of differing use cases and design considerations around the Variety requirement, these new databases...
The SQL++ Query Language: Configurable, Unifying and Semi-structured
NoSQL databases support semi-structured data, typically modeled as JSON. They also provide limited (but expanding) query languages. Their idiomatic, non-SQL language constructs, the many variations, and the lack of formal semantics inhibit deep understanding of the query languages, and also impede progress towards clean, powerful, declarative query languages. This paper specifies the syntax and...
NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management
One of the key advances in resolving the “big-data” problem has been the emergence of an alternative database technology. Today, classic RDBMS are complemented by a rich set of alternative Data Management Systems (DMS) specially designed to handle the volume, variety, velocity and variability of Big Data collections; these DMS include NoSQL, NewSQL and Search-based systems. NewSQL is a class of...
Renaissance in Data Management Systems: SQL, NoSQL, and NewSQL
The recent emergence of a new class of systems for data management has challenged the well-entrenched relational databases. These systems provide several choices for data management under the umbrella term NoSQL. Making a right choice is critical to building applications that meet business needs. Performance, scalability and cost are the principal business drivers for these new systems. By desi...
Performance Analysis of Scalable SQL and NoSQL Databases: A Quantitative Approach
PERFORMANCE ANALYSIS OF SCALABLE SQL AND NOSQL DATABASES: A QUANTITATIVE APPROACH, by Harish Balasubramanian. May 2014. Advisor: Dr. Weisong Shi. Major: Computer Science. Degree: Master of Science. Benchmarking is a common method in evaluating and choosing a NoSQL database. There are already lots of benchmarking reports available on the internet and in research papers. Most of the ben...